6 research outputs found
On the Role of Optimization in Double Descent: A Least Squares Study
Empirically it has been observed that the performance of deep neural networks
steadily improves as we increase model size, contradicting the classical view
on overfitting and generalization. Recently, the double descent phenomenon has
been proposed to reconcile this observation with theory, suggesting that the
test error has a second descent when the model becomes sufficiently
overparameterized, as the model size itself acts as an implicit regularizer. In
this paper we add to the growing body of work in this space, providing a
careful study of learning dynamics as a function of model size for the least
squares scenario. We show an excess risk bound for the gradient descent
solution of the least squares objective. The bound depends on the smallest
non-zero eigenvalue of the covariance matrix of the input features, via a
functional form that has the double descent behavior. This gives a new
perspective on the double descent curves reported in the literature. Our
analysis of the excess risk allows us to decouple the effects of optimization and
generalization error. In particular, we find that in the case of noiseless
regression, double descent is explained solely by optimization-related
quantities, which was missed in studies focusing on the Moore-Penrose
pseudoinverse solution. We believe that our derivation provides an alternative
view compared to existing work, shedding some light on a possible cause of this
phenomenon, at least in the considered least squares setting. We empirically
explore whether our predictions hold for neural networks, in particular whether the
covariance of intermediate hidden activations has a behavior similar to the one
predicted by our derivations.
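As a quick illustration of the curve shape this abstract describes (a toy simulation, not the paper's actual experiments or bound), one can sweep the feature dimension past the sample size for the minimum-norm least squares solution, which gradient descent from zero initialization converges to. The data model, noise level, and dimensions below are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_norm_test_error(n_train, d, n_test=2000, noise=0.7):
    """Test error of the minimum-norm least squares fit (the limit of
    gradient descent from zero init) on a random linear problem."""
    w_star = rng.normal(size=d) / np.sqrt(d)          # ground-truth weights, norm ~ 1
    X = rng.normal(size=(n_train, d))
    y = X @ w_star + noise * rng.normal(size=n_train)
    # lstsq returns the minimum-norm solution, matching GD from zero init
    w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    X_test = rng.normal(size=(n_test, d))
    return np.mean((X_test @ w_hat - X_test @ w_star) ** 2)

n = 40                                    # fixed training-set size
dims = [5, 20, 35, 40, 45, 80, 200]       # feature dimensions to sweep
errs = {d: np.mean([min_norm_test_error(n, d) for _ in range(30)])
        for d in dims}
# The error typically spikes near the interpolation threshold d == n,
# where the smallest non-zero eigenvalue of the feature covariance is
# tiny, and descends again as d grows: the double descent shape.
```

The spike at d close to n comes from the near-singular conditioning of the feature matrix, which is consistent with the abstract's point that the smallest non-zero covariance eigenvalue drives the curve.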
PAC-Bayes analysis beyond the usual bounds
We focus on a stochastic learning model where the learner observes a finite set of training examples and the output of the learning process is a data-dependent distribution over a space of hypotheses. The learned data-dependent distribution is then used to make randomized predictions, and the high-level theme addressed here is guaranteeing the quality of predictions on examples that were not seen during training, i.e. generalization. In this setting the unknown quantity of interest is the expected risk of the data-dependent randomized predictor, for which upper bounds can be derived via a PAC-Bayes analysis, leading to PAC-Bayes bounds. Specifically, we present a basic PAC-Bayes inequality for stochastic kernels, from which one may derive extensions of various known PAC-Bayes bounds as well as novel bounds. We clarify the role of the requirements of fixed ‘data-free’ priors, bounded losses, and i.i.d. data. We highlight that those requirements were used to upper-bound an exponential moment term, while the basic PAC-Bayes theorem remains valid without those restrictions. We present three bounds that illustrate the use of data-dependent priors, including one for the unbounded square loss
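For concreteness, one classical member of the family of bounds this abstract builds on is the kl-form PAC-Bayes bound (Maurer/Seeger style) for losses bounded in [0, 1], which requires exactly the fixed data-free prior, bounded loss, and i.i.d. assumptions discussed above. A sketch with illustrative numbers (the empirical risk, KL value, and sample size below are made up):

```python
import math

def binary_kl(q, p):
    """kl(q || p) between Bernoulli parameters, clipped away from {0, 1}."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def kl_inverse_upper(q_hat, c, tol=1e-9):
    """Largest p >= q_hat with kl(q_hat || p) <= c, found by bisection
    (binary_kl is increasing in p on [q_hat, 1))."""
    lo, hi = q_hat, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(q_hat, mid) <= c:
            lo = mid
        else:
            hi = mid
    return lo

def pac_bayes_kl_bound(emp_risk, kl_qp, n, delta=0.05):
    """With prob >= 1 - delta over an i.i.d. sample of size n,
    kl(emp_risk || risk) <= (KL(Q||P) + ln(2 sqrt(n) / delta)) / n,
    so the true risk is at most the kl-inverse of the right-hand side."""
    c = (kl_qp + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_risk, c)

# illustrative numbers: empirical risk 0.05, KL(Q||P) = 10 nats, n = 10000
bound = pac_bayes_kl_bound(0.05, 10.0, 10_000)
```

The exponential-moment term mentioned in the abstract is what the ln(2√n/δ) factor upper-bounds in this bounded-loss, data-free-prior case; relaxing those requirements changes that term, not the basic inequality.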
Tighter risk certificates for neural networks
This paper presents an empirical study regarding training probabilistic neural networks using training objectives derived from PAC-Bayes bounds. In the context of probabilistic neural networks, the output of training is a probability distribution over network weights. We present two training objectives, used here for the first time in connection with training neural networks. These two training objectives are derived from tight PAC-Bayes bounds. We also re-implement a previously used training objective based on a classical PAC-Bayes bound, to compare the properties of the predictors learned using the different training objectives. We compute risk certificates for the learnt predictors, based on part of the data used to learn the predictors. We further experiment with different types of priors on the weights (both data-free and data-dependent priors) and neural network architectures. Our experiments on MNIST and CIFAR-10 show that our training methods produce competitive test set errors and non-vacuous risk bounds with much tighter values than previous results in the literature, showing promise not only to guide the learning algorithm through bounding the risk but also for model selection. These observations suggest that the methods studied here might be good candidates for self-certified learning, in the sense of using the whole data set for learning a predictor and certifying its risk on any unseen data (from the same distribution as the training data) potentially without the need for holding out test data
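To sketch how a PAC-Bayes bound can guide model selection as described here, the toy example below scores a Gaussian posterior over linear-classifier weights by a McAllester-style (Pinsker-relaxed) certificate and picks the posterior scale minimizing it. Everything is illustrative: the data, the posterior mean (set to the true weights as a stand-in for a learned mean), and the simple relaxation (the paper itself uses tighter bounds and data-dependent priors).

```python
import numpy as np

rng = np.random.default_rng(1)

# toy, roughly linearly separable data (illustrative, not from the paper)
n, d = 500, 10
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_true + 0.3 * rng.normal(size=n))

w_mean = w_true.copy()   # stand-in for a learned posterior mean

def gaussian_kl(mu_q, sigma_q, sigma_p):
    """KL( N(mu_q, sigma_q^2 I) || N(0, sigma_p^2 I) ) in closed form."""
    k = mu_q.size
    return (k * (sigma_q**2 / sigma_p**2 - 1 - 2 * np.log(sigma_q / sigma_p))
            + (mu_q @ mu_q) / sigma_p**2) / 2

def emp_risk(sigma_q, n_mc=200):
    """Monte Carlo 0-1 risk of the randomized classifier w ~ N(w_mean, sigma_q^2 I)."""
    W = w_mean + sigma_q * rng.normal(size=(n_mc, d))
    preds = np.sign(X @ W.T)                 # shape (n, n_mc)
    return np.mean(preds != y[:, None])

def certificate(sigma_q, sigma_p=1.0, delta=0.05):
    # McAllester-style relaxation (via Pinsker) of the PAC-Bayes-kl bound
    kl = gaussian_kl(w_mean, sigma_q, sigma_p)
    return emp_risk(sigma_q) + np.sqrt((kl + np.log(2 * np.sqrt(n) / delta)) / (2 * n))

# model selection by certificate: the bound trades off empirical risk
# (favours small sigma_q) against KL to the prior (favours large sigma_q)
best_sigma = min([0.01, 0.05, 0.1, 0.3, 1.0], key=certificate)
```

The same trade-off is what a PAC-Bayes-derived training objective optimizes directly over the posterior parameters instead of a grid.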
PAC-Bayes bounds for stable algorithms with instance-dependent priors
PAC-Bayes bounds have been proposed to get risk estimates based on a training
sample. In this paper the PAC-Bayes approach is combined with stability of the
hypothesis learned by a Hilbert space valued algorithm. The PAC-Bayes setting is
used with a Gaussian prior centered at the expected output. Thus a novelty of our
paper is using priors defined in terms of the data-generating distribution. Our main
result estimates the risk of the randomized algorithm in terms of the hypothesis
stability coefficients. We also provide a new bound for the SVM classifier, which
is compared to other known bounds experimentally. Ours appears to be the first
uniform hypothesis stability-based bound that evaluates to non-trivial values.
Efficacy and safety of two neutralising monoclonal antibody therapies, sotrovimab and BRII-196 plus BRII-198, for adults hospitalised with COVID-19 (TICO): a randomised controlled trial
We aimed to assess the efficacy and safety of two neutralising monoclonal antibody therapies (sotrovimab [Vir Biotechnology and GlaxoSmithKline] and BRII-196 plus BRII-198 [Brii Biosciences]) for adults admitted to hospital (hereafter referred to as hospitalised) with COVID-19.
In this multinational, double-blind, randomised, placebo-controlled, clinical trial (Therapeutics for Inpatients with COVID-19 [TICO]), adults (aged ≥18 years) hospitalised with COVID-19 at 43 hospitals in the USA, Denmark, Switzerland, and Poland were recruited. Patients were eligible if they had laboratory-confirmed SARS-CoV-2 infection and COVID-19 symptoms for up to 12 days. Using a web-based application, participants were randomly assigned (2:1:2:1), stratified by trial site pharmacy, to sotrovimab 500 mg, matching placebo for sotrovimab, BRII-196 1000 mg plus BRII-198 1000 mg, or matching placebo for BRII-196 plus BRII-198, in addition to standard of care. Each study product was administered as a single dose given intravenously over 60 min. The concurrent placebo groups were pooled for analyses. The primary outcome was time to sustained clinical recovery, defined as discharge from the hospital to home and remaining at home for 14 consecutive days, up to day 90 after randomisation. Interim futility analyses were based on two seven-category ordinal outcome scales on day 5 that measured pulmonary status and extrapulmonary complications of COVID-19. The safety outcome was a composite of death, serious adverse events, incident organ failure, and serious coinfection up to day 90 after randomisation. Efficacy and safety outcomes were assessed in the modified intention-to-treat population, defined as all patients randomly assigned to treatment who started the study infusion. This study is registered with ClinicalTrials.gov, NCT04501978.
Between Dec 16, 2020, and March 1, 2021, 546 patients were enrolled and randomly assigned to sotrovimab (n=184), BRII-196 plus BRII-198 (n=183), or placebo (n=179), of whom 536 received part or all of their assigned study drug (sotrovimab n=182, BRII-196 plus BRII-198 n=176, or placebo n=178; median age of 60 years [IQR 50–72], 228 [43%] patients were female and 308 [57%] were male). At this point, enrolment was halted on the basis of the interim futility analysis. At day 5, neither the sotrovimab group nor the BRII-196 plus BRII-198 group had significantly higher odds of more favourable outcomes than the placebo group on either the pulmonary scale (adjusted odds ratio sotrovimab 1·07 [95% CI 0·74–1·56]; BRII-196 plus BRII-198 0·98 [95% CI 0·67–1·43]) or the pulmonary-plus complications scale (sotrovimab 1·08 [0·74–1·58]; BRII-196 plus BRII-198 1·00 [0·68–1·46]). By day 90, sustained clinical recovery was seen in 151 (85%) patients in the placebo group compared with 160 (88%) in the sotrovimab group (adjusted rate ratio 1·12 [95% CI 0·91–1·37]) and 155 (88%) in the BRII-196 plus BRII-198 group (1·08 [0·88–1·32]). The composite safety outcome up to day 90 was met by 48 (27%) patients in the placebo group, 42 (23%) in the sotrovimab group, and 45 (26%) in the BRII-196 plus BRII-198 group. 13 (7%) patients in the placebo group, 14 (8%) in the sotrovimab group, and 15 (9%) in the BRII-196 plus BRII-198 group died up to day 90.
Neither sotrovimab nor BRII-196 plus BRII-198 showed efficacy for improving clinical outcomes among adults hospitalised with COVID-19.
US National Institutes of Health and Operation Warp Speed.